

Polls open in Honduras presidential election marked by fraud accusations

Al Jazeera

Hondurans are heading to the polls to elect a new president in a tightly contested race that is taking place amid concerns over voter fraud in the impoverished Central American country. Polls opened on Sunday at 7am local time (13:00 GMT) for 10 hours of voting, with the first results expected late Sunday night. The elections, in which the 128 members of Congress, hundreds of mayors, and thousands of other public officials will also be chosen, are taking place in a highly polarised climate, with the three top candidates accusing each other of plotting fraud. One of the leading candidates, Moncada, has suggested that she will not recognise the official results. Incumbent President Xiomara Castro of the LIBRE party is limited by law to one term in office.


Evaluating Inter-Column Logical Relationships in Synthetic Tabular Data Generation

Long, Yunbo, Xu, Liming, Brintrup, Alexandra

arXiv.org Artificial Intelligence

To evaluate the fidelity of synthetic tabular data, numerous metrics have been proposed to assess accuracy and diversity, including both low-order statistics (e.g., Density Estimation and Correlation Score (Zhang et al., 2023), Average Coverage Scores (Zein & Urvoy, 2022)) and high-order statistics (e.g., α-Precision and β-Recall (Alaa et al., 2022)). However, these metrics operate at a high level and fail to evaluate whether synthetic data preserves logical relationships, such as hierarchical or semantic dependencies between features. This highlights the need for a more fine-grained, context-aware evaluation of multivariate dependencies. To address this, we propose three evaluation metrics: Hierarchical Consistency Score (HCS), Multivariate Dependency Index (MDI), and Distributional Similarity Index (DSI). To assess the effectiveness of these metrics in quantifying inter-column relationships, we select five representative tabular data generation methods from different categories for evaluation. Their performance is measured using both existing and our proposed metrics on a real-world dataset rich in logical consistency and dependency constraints. Experimental results validate the effectiveness of our proposed metrics and reveal the limitations of existing approaches in preserving logical relationships in synthetic tabular data. Additionally, we discuss potential pathways to better capture logical constraints within joint distributions, paving the way for future advancements in synthetic tabular data generation.
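The abstract does not reproduce the exact definition of the Hierarchical Consistency Score, but the idea of checking hierarchical dependencies between columns can be sketched in a few lines. The function below (a hypothetical illustration, not the paper's metric) scores the fraction of synthetic rows whose parent-child value pair never occurs in the real data:

```python
from collections import defaultdict

def hierarchy_violation_rate(real_rows, synth_rows, parent, child):
    """Fraction of synthetic rows whose (parent, child) value pair
    never occurs in the real data -- a crude proxy for hierarchical
    consistency between two columns."""
    allowed = defaultdict(set)
    for row in real_rows:
        allowed[row[parent]].add(row[child])
    violations = sum(
        1 for row in synth_rows if row[child] not in allowed[row[parent]]
    )
    return violations / len(synth_rows)

real = [
    {"country": "Honduras", "city": "Tegucigalpa"},
    {"country": "Spain", "city": "Madrid"},
]
synth = [
    {"country": "Honduras", "city": "Tegucigalpa"},
    {"country": "Honduras", "city": "Madrid"},  # logically inconsistent pair
]
print(hierarchy_violation_rate(real, synth, "country", "city"))  # 0.5
```

A marginal-distribution metric would score both synthetic rows as plausible; only a pairwise consistency check of this kind catches the impossible (country, city) combination.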


MessIRve: A Large-Scale Spanish Information Retrieval Dataset

Valentini, Francisco, Cotik, Viviana, Furman, Damián, Bercovich, Ivan, Altszyler, Edgar, Pérez, Juan Manuel

arXiv.org Artificial Intelligence

Information retrieval (IR) is the task of finding relevant documents in response to a user query. Although Spanish is the second most spoken native language, current IR benchmarks lack Spanish data, hindering the development of information access tools for Spanish speakers. We introduce MessIRve, a large-scale Spanish IR dataset with around 730 thousand queries from Google's autocomplete API and relevant documents sourced from Wikipedia. MessIRve's queries reflect diverse Spanish-speaking regions, unlike other datasets that are translated from English or do not consider dialectal variations. The large size of the dataset allows it to cover a wide variety of topics, unlike smaller datasets. We provide a comprehensive description of the dataset, comparisons with existing datasets, and baseline evaluations of prominent IR models. Our contributions aim to advance Spanish IR research and improve information access for Spanish speakers.
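The baseline evaluations mentioned above typically report rank-based metrics. As a minimal illustration (not tied to MessIRve's actual evaluation scripts), recall@k over a retrieval run and relevance judgments can be computed as:

```python
def recall_at_k(run, qrels, k=10):
    """Mean recall@k over queries: the fraction of a query's relevant
    documents that appear in the top-k retrieved list."""
    scores = []
    for qid, ranked in run.items():
        relevant = qrels.get(qid, set())
        if not relevant:
            continue
        hits = sum(1 for doc in ranked[:k] if doc in relevant)
        scores.append(hits / len(relevant))
    return sum(scores) / len(scores)

# Toy run: two queries, one relevant document each.
run = {"q1": ["d3", "d1", "d9"], "q2": ["d2", "d7", "d4"]}
qrels = {"q1": {"d1"}, "q2": {"d5"}}
print(recall_at_k(run, qrels, k=2))  # 0.5: d1 found for q1, d5 missed for q2
```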


On the Empirical Complexity of Reasoning and Planning in LLMs

Kang, Liwei, Zhao, Zirui, Hsu, David, Lee, Wee Sun

arXiv.org Artificial Intelligence

Chain-of-thought (CoT), tree-of-thought (ToT), and related techniques work surprisingly well in practice for some complex reasoning tasks with Large Language Models (LLMs), but why? This work seeks the underlying reasons by conducting experimental case studies and linking the performance benefits to well-established sample and computational complexity principles in machine learning. We experimented with 6 reasoning tasks, ranging from grade school math, air travel planning, ..., to Blocksworld. The results suggest that (i) both CoT and ToT benefit significantly from task decomposition, which breaks a complex reasoning task into a sequence of steps with low sample complexity and explicitly outlines the reasoning structure, and (ii) for computationally hard reasoning tasks, the more sophisticated tree structure of ToT outperforms the linear structure of CoT. These findings provide useful guidelines for the use of LLMs in solving reasoning tasks in practice.
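The control structure behind tree-of-thought is a search over partial reasoning states. The sketch below shows a generic beam search of that shape, with a toy arithmetic task standing in for the LLM's thought generation and a hand-written scorer standing in for its self-evaluation; none of this is the paper's implementation:

```python
import heapq

def tree_of_thought_search(root, expand, score, is_goal, beam=3, max_depth=10):
    """Beam search over reasoning states: at each depth, expand every
    state on the frontier, then keep only the `beam` highest-scoring
    partial 'thoughts' -- the tree structure CoT's linear chain lacks."""
    frontier = [root]
    for _ in range(max_depth):
        candidates = []
        for state in frontier:
            for nxt in expand(state):
                if is_goal(nxt):
                    return nxt
                candidates.append(nxt)
        frontier = heapq.nlargest(beam, candidates, key=score)
        if not frontier:
            return None
    return None

# Toy task: reach 11 from 1 using "double" or "add 3" moves.
goal = 11
result = tree_of_thought_search(
    root=(1, ()),
    expand=lambda s: [(s[0] * 2, s[1] + ("*2",)), (s[0] + 3, s[1] + ("+3",))],
    score=lambda s: -abs(goal - s[0]),  # greedy heuristic: closer is better
    is_goal=lambda s: s[0] == goal,
)
print(result)  # a state (11, path-of-moves)
```

The beam width trades computation for robustness: beam=1 degenerates to a greedy chain (CoT-like), while larger beams explore alternative decompositions in parallel.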


Identifying Systems with Symmetries using Equivariant Autoregressive Reservoir Computers

Vides, Fredy, Nogueira, Idelfonso B. R., Banegas, Lendy, Flores, Evelyn

arXiv.org Artificial Intelligence

The investigation reported in this document focuses on identifying systems with symmetries using equivariant autoregressive reservoir computers. General results in structured matrix approximation theory are presented, exploring a two-fold approach. Firstly, a comprehensive examination of generic symmetry-preserving nonlinear time delay embedding is conducted. This involves analyzing time series data sampled from an equivariant system under study. Secondly, sparse least-squares methods are applied to discern approximate representations of the output coupling matrices. These matrices play a pivotal role in determining the nonlinear autoregressive representation of an equivariant system. The structural characteristics of these matrices are dictated by the set of symmetries inherent in the system. The document outlines prototypical algorithms derived from the described techniques, offering insight into their practical applications. Emphasis is placed on their effectiveness in the identification and predictive simulation of equivariant nonlinear systems, regardless of whether such systems exhibit chaotic behavior.
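The two ingredients named above, a time-delay embedding of the sampled series and a least-squares fit of the output coupling matrices, can be illustrated on a linear toy system. This is only a minimal dense sketch (the paper uses equivariant, structured, and sparse variants of these steps):

```python
import numpy as np

def time_delay_embedding(series, delay):
    """Rows are the lag vectors [x[t-1], ..., x[t-delay]] for t = delay .. len-1."""
    return np.column_stack([series[delay - k: len(series) - k]
                            for k in range(1, delay + 1)])

def fit_ar_readout(series, delay, ridge=1e-8):
    """Ridge-regularised least-squares fit of a linear autoregressive
    readout w such that x[t] ~ w . [x[t-1], ..., x[t-delay]]."""
    X = time_delay_embedding(series, delay)
    y = series[delay:]
    return np.linalg.solve(X.T @ X + ridge * np.eye(delay), X.T @ y)

# Recover a known recurrence x[t] = 1.5 x[t-1] - 0.7 x[t-2] from its trajectory.
x = np.zeros(60)
x[0], x[1] = 1.0, 0.5
for t in range(2, 60):
    x[t] = 1.5 * x[t - 1] - 0.7 * x[t - 2]
w = fit_ar_readout(x, delay=2)
print(np.round(w, 3))  # approximately [1.5, -0.7]
```

In the equivariant setting, the system's symmetries constrain the structure of the fitted matrices; here the fit is unconstrained for clarity.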


Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint

Cyr, Eric C., Gulian, Mamikon A., Patel, Ravi G., Perego, Mauro, Trask, Nathaniel A.

arXiv.org Machine Learning

Despite their importance, such theorems offer no explanation for the advantages of neural networks, let alone deep neural networks, over classical approximation methods, since universal approximation properties are enjoyed by polynomials (Cheney and Light, 2009) as well as single layer neural networks (Cybenko, 1989). To address this, a recent thread has emerged in the literature concerning optimal approximation with deep ReLU networks, where the error in an optimal choice of weights and biases is bounded from above using the width and depth of the neural network. For example, using the "sawtooth" function of Telgarsky (2015), Yarotsky (2017) constructed an exponentially accurate (in the number of layers) ReLU network emulator for multiplication (x, y) ↦ xy. This construction is used to obtain upper bounds on optimal approximation based upon DNN emulation of polynomial approximation. Building on these ideas, Opschoor et al. (2019) proved that optimal approximation with deep ReLU networks can emulate adaptive hp-finite element approximation, with greater depth allowing p-refinement to obtain exponential convergence rates. An additional contribution by He et al. (2018) reinterpreted single hidden layer ReLU networks as r-adaptive piecewise linear finite element spaces.
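The Yarotsky construction mentioned above is concrete enough to run. The sawtooth (tent) map is exactly representable by a tiny ReLU network, and subtracting its scaled compositions from x yields the piecewise linear interpolant of x² on a dyadic grid, with error at most 4^-(m+1) after m compositions; multiplication then follows from xy = ((x+y)² − x² − y²)/2. A numerical check:

```python
import numpy as np

def tent(x):
    """Telgarsky's sawtooth generator g(x) = 2 min(x, 1 - x) on [0, 1];
    exactly representable as a two-unit ReLU network."""
    return 2 * np.minimum(x, 1 - x)

def relu_square(x, m=10):
    """Yarotsky's construction: f_m(x) = x - sum_{s=1}^m g^(s)(x) / 4^s
    approximates x^2 on [0, 1] with error at most 4^-(m+1)."""
    g = x
    out = np.asarray(x, dtype=float).copy()
    for s in range(1, m + 1):
        g = tent(g)         # s-fold composition of the sawtooth
        out -= g / 4 ** s
    return out

xs = np.linspace(0, 1, 1001)
err = np.max(np.abs(relu_square(xs) - xs ** 2))
print(err)  # below the bound 4**-11, about 2.4e-07
```

Each extra composition adds a fixed number of layers but quarters the error, which is the source of the exponential-in-depth accuracy.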


Distributed Correlation-Based Feature Selection in Spark

Palma-Mendoza, Raul-Jose, de-Marcos, Luis, Rodriguez, Daniel, Alonso-Betanzos, Amparo

arXiv.org Machine Learning

CFS (Correlation-Based Feature Selection) is an FS algorithm that has been successfully applied to classification problems in many domains. We describe Distributed CFS (DiCFS) as a completely redesigned, scalable, parallel and distributed version of the CFS algorithm, capable of dealing with the large volumes of data typical of big data applications. Two versions of the algorithm were implemented and compared using the Apache Spark cluster computing model, currently gaining popularity due to its much faster processing times than Hadoop's MapReduce model. We tested our algorithms on four publicly available datasets, each with a large number of instances, two of which also have a large number of features. The results show that our algorithms were superior to the original implementation in terms of both time-efficiency and scalability. In leveraging a computer cluster, they were able to handle larger datasets than the non-distributed WEKA version while maintaining the quality of the results, i.e., exactly the same features were returned by our algorithms when compared to the original algorithm available in WEKA.
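The quantity CFS optimises, whether computed on one machine or distributed, is Hall's merit heuristic, which rewards features correlated with the class and penalises redundancy among the selected features:

```python
from math import sqrt

def cfs_merit(feature_class_corrs, avg_feature_feature_corr):
    """CFS merit of a feature subset (Hall, 1999):
        M_S = k * r_cf / sqrt(k + k (k - 1) * r_ff)
    where k is the subset size, r_cf the mean feature-class
    correlation, and r_ff the mean feature-feature correlation."""
    k = len(feature_class_corrs)
    r_cf = sum(feature_class_corrs) / k
    return k * r_cf / sqrt(k + k * (k - 1) * avg_feature_feature_corr)

# Two equally relevant features: the redundant pair scores lower.
print(cfs_merit([0.8, 0.8], 0.9))  # highly redundant
print(cfs_merit([0.8, 0.8], 0.1))  # nearly independent -> higher merit
```

The distributed work lies in computing the correlation terms over partitioned data; the merit formula itself is this cheap scalar expression.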


Distributed ReliefF based Feature Selection in Spark

Palma-Mendoza, Raul-Jose, Rodriguez, Daniel, de-Marcos, Luis

arXiv.org Machine Learning

Feature selection (FS) is a key research area in the machine learning and data mining fields: removing irrelevant and redundant features usually helps to reduce the effort required to process a dataset while maintaining or even improving the processing algorithm's accuracy. However, traditional algorithms designed for executing on a single machine lack scalability to deal with the increasing amount of data that has become available in the current Big Data era. ReliefF is one of the most important algorithms successfully implemented in many FS applications. In this paper, we present DiReliefF, a completely redesigned distributed version of the popular ReliefF algorithm based on the Spark cluster computing model. Spark is gaining popularity due to its much faster processing times compared with Hadoop's MapReduce implementation. The effectiveness of our proposal is tested on four publicly available datasets, all of them with a large number of instances and two of them also with a large number of features. Subsets of these datasets were also used to compare the results to a non-distributed implementation of the algorithm. The results show that the non-distributed implementation is unable to handle such large volumes of data without specialized hardware, while our design can process them in a scalable way with much better processing times and memory usage.
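The per-sample weight update at the heart of the ReliefF family is easy to state: a feature loses weight in proportion to its distance to the nearest same-class neighbour (hit) and gains weight in proportion to its distance to the nearest other-class neighbour (miss). The sketch below implements the simplified two-class, single-neighbour Relief variant, not the full ReliefF or the distributed DiReliefF design:

```python
import numpy as np

def relief_weights(X, y, rng=None):
    """Simplified two-class Relief (k = 1 neighbour): relevant features
    separate classes (large miss distance) without varying within a
    class (small hit distance)."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0) + 1e-12  # per-feature normaliser
    w = np.zeros(d)
    for i in rng.choice(n, n, replace=False):
        dists = np.abs(X - X[i]).sum(axis=1)      # L1 distance to sample i
        dists[i] = np.inf                         # exclude the sample itself
        same, diff = y == y[i], y != y[i]
        hit = np.argmin(np.where(same, dists, np.inf))
        miss = np.argmin(np.where(diff, dists, np.inf))
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / span
    return w / n

# Feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(1)
y = np.array([0] * 20 + [1] * 20)
X = np.column_stack([y + 0.05 * rng.standard_normal(40),
                     rng.standard_normal(40)])
w = relief_weights(X, y)
print(w)  # weight of feature 0 should dominate
```

The distributed version's challenge is the neighbour searches, which touch the whole dataset for every sampled instance; the weight arithmetic itself is trivial.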


Radio Progreso: Honduran journalists under threat

Al Jazeera

In the Central American country of Honduras, a political story has been unfolding which deserves more coverage than it has been getting. Close to 40 people have been killed and more than 2,000 arrested, following the contested re-election of President Juan Orlando Hernandez to a second term in office. With 54 percent of the votes counted, the trend was a clear win for left-wing opposition candidate, Salvador Nasralla. But then the computer system mysteriously broke down. When it finally came back online a full day later, the vote count had been turned upside down: the right-wing incumbent, Juan Orlando Hernandez, was suddenly ahead.


Russia launches facial recognition programme to find anyone's face on Twitter

The Independent - Tech

A Russian company has launched a programme that can identify a stranger among 300 million Twitter users in less than a second. The social media platform has responded to the new software, called "FindFace", saying its use is in "violation" of its rules and that it is taking the matter "very seriously". "We see lots of opportunities for Twitter users on the service," Artem Kukharenko, co-founder of NTechLab, told BuzzFeed. "We think this is something many people will use," he added, claiming the technology could be used to reduce spam profiles. "Not in the US, but in other countries there is a real problem of politicians, reporters, finding that someone created a fake account for them. I was involved back in Russia with scandals with a fake account posing as a politician that tweeted something and created a political scandal," he said. Christopher Weatherhead, technologist at Privacy International, said: "The software created by NTechLab highlights the ease with which cross-referencing profile photos is possible."